
3619: perf: Optimize some decimal expressions #42

Open
martin-augment wants to merge 4 commits into main from pr-3619-2026-03-05-09-24-01

Conversation

@martin-augment
Owner

3619: To review by AI

Replace the 4-node expression tree (Cast→BinaryExpr→Cast→Cast) used for
Decimal128 arithmetic that may overflow with a single fused expression
that performs i256 register arithmetic directly. This reduces per-batch
allocation from 4 intermediate arrays (112 bytes/elem) to 1 output array
(16 bytes/elem).

The new WideDecimalBinaryExpr evaluates children, performs add/sub/mul
using i256 intermediates via try_binary, applies scale adjustment with
HALF_UP rounding, checks precision bounds, and outputs a single
Decimal128 array. Follows the same pattern as decimal_div.
Add benchmark comparing old Cast->BinaryExpr->Cast chain vs fused
WideDecimalBinaryExpr for Decimal128 add/sub/mul. Covers four cases:
add with same scale, add with different scales, multiply, and subtract.
Eliminate redundant CheckOverflow when wrapping WideDecimalBinaryExpr
(which already handles overflow). Fuse Cast(Decimal128→Decimal128) +
CheckOverflow into a single DecimalRescaleCheckOverflow expression that
rescales and validates precision in one pass.
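The fused approach described above can be illustrated with a std-only sketch. This is not the Comet implementation: the real WideDecimalBinaryExpr widens Decimal128 (i128) values to i256, while this analog widens i64 values to i128, but the pattern is the same: one pass that widens, operates, checks the precision bound, and writes a single output array instead of materializing intermediate cast results.

```rust
// Std-only analog of the fused wide-decimal add: widen each pair of values,
// add in the wider type (no overflow possible), check the precision bound,
// and emit one output buffer. Names and the error string are illustrative.
fn fused_add(lhs: &[i64], rhs: &[i64], bound: i128) -> Result<Vec<i64>, String> {
    lhs.iter()
        .zip(rhs.iter())
        .map(|(&a, &b)| {
            let wide = a as i128 + b as i128; // widening add: cannot overflow
            if wide.abs() > bound {
                Err(format!("overflow: {wide} exceeds bound {bound}"))
            } else {
                Ok(wide as i64)
            }
        })
        .collect()
}

fn main() {
    // 10^4 - 1 plays the role of the precision bound for a 4-digit decimal.
    println!("{:?}", fused_add(&[1500, 2500], &[1500, 2500], 9_999));
}
```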
@coderabbitai

coderabbitai bot commented Mar 5, 2026

Walkthrough

This change introduces two new fused decimal expression types for Spark decimal arithmetic: WideDecimalBinaryExpr for handling Decimal128 binary operations (add, subtract, multiply) with i256 intermediate computation, and DecimalRescaleCheckOverflow for performing rescale and precision overflow checks in a single pass. The planner is modified to construct these expressions instead of using intermediate Decimal256 casts when extended precision is needed. A new benchmark file compares performance between the old cast-based path and the new fused approach. Public exports are updated to expose these new types and associated enums through the library's public API.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the performance of Decimal128 data type operations by introducing specialized, fused expressions for arithmetic and overflow checking. These changes streamline the execution plan, reducing computational overhead and improving efficiency for decimal-heavy workloads, particularly where intermediate precision requirements are high.

Highlights

  • Optimized Decimal128 Arithmetic: Introduced WideDecimalBinaryExpr to perform Decimal128 addition, subtraction, and multiplication using i256 intermediate arithmetic, replacing a less efficient multi-step casting and binary expression chain for operations requiring wider precision.
  • Fused Decimal Rescale and Overflow Check: Added DecimalRescaleCheckOverflow to combine decimal rescaling and precision overflow checking into a single, optimized expression, eliminating redundant CheckOverflow operations after Cast for Decimal128 types.
  • Planner Integration: The PhysicalPlanner now intelligently applies these new fused expressions, skipping redundant checks and optimizing the construction of the physical plan for Decimal128 operations.
  • Performance Benchmarks: Included new benchmarks to compare the performance of the old and new decimal expression implementations, demonstrating the efficiency gains from these optimizations.
Changelog
  • native/core/src/execution/planner.rs
    • Imported new DecimalRescaleCheckOverflow, WideDecimalBinaryExpr, and WideDecimalOp expressions.
    • Modified CheckOverflow expression creation to fuse Cast and CheckOverflow into DecimalRescaleCheckOverflow for Decimal128 types.
    • Updated binary expression planning to use WideDecimalBinaryExpr for Decimal128 operations that exceed DECIMAL128_MAX_PRECISION, optimizing intermediate precision handling.
  • native/spark-expr/Cargo.toml
    • Added a new benchmark target named wide_decimal.
  • native/spark-expr/benches/wide_decimal.rs
    • Added a new benchmark file to compare the performance of the old multi-step decimal arithmetic against the new WideDecimalBinaryExpr.
  • native/spark-expr/src/lib.rs
    • Exported DecimalRescaleCheckOverflow, WideDecimalBinaryExpr, and WideDecimalOp from the math_funcs module.
  • native/spark-expr/src/math_funcs/internal/decimal_rescale_check.rs
    • Added a new module defining the DecimalRescaleCheckOverflow struct, which fuses decimal rescaling and overflow checking.
  • native/spark-expr/src/math_funcs/internal/mod.rs
    • Exported DecimalRescaleCheckOverflow.
  • native/spark-expr/src/math_funcs/mod.rs
    • Added wide_decimal_binary_expr module.
    • Exported WideDecimalBinaryExpr and WideDecimalOp.
  • native/spark-expr/src/math_funcs/wide_decimal_binary_expr.rs
    • Added a new module defining the WideDecimalBinaryExpr struct for optimized Decimal128 binary operations using i256 intermediate arithmetic.
Activity
  • The pull request description indicates that it is 'To review by AI', suggesting an automated review process is expected or underway.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
    new_v, p_out, s_out,
)))
}


Scalar path silently swallows errors in ANSI mode

High Severity

In the scalar evaluation path of DecimalRescaleCheckOverflow, when fail_on_error is true (ANSI mode), rescale_and_check correctly returns an Err, but .ok() on line 207 silently converts it to None. This causes the scalar path to return a null value instead of propagating the overflow error. The array path correctly propagates errors via ? on try_unary. This means ANSI-mode overflow errors are silently swallowed for scalar inputs, producing null instead of an error.
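The reported pattern can be reproduced in isolation. The names below are hypothetical stand-ins, not the actual Comet functions: `rescale` fails on overflow, the buggy caller maps Err to None via `.ok()` (which becomes SQL NULL), and the fixed caller keeps the error visible to its caller.

```rust
// Stand-in for the fallible rescale: errors once the value exceeds a bound.
fn rescale(v: i128) -> Result<i128, String> {
    if v.abs() > 99 {
        Err("DECIMAL_OVERFLOW".to_string())
    } else {
        Ok(v)
    }
}

// Bug pattern: .ok() silently converts the error into None (SQL NULL).
fn buggy_scalar_path(v: i128) -> Option<i128> {
    rescale(v).ok()
}

// Fix pattern: the error is propagated instead of being dropped.
fn fixed_scalar_path(v: i128) -> Result<Option<i128>, String> {
    rescale(v).map(Some)
}

fn main() {
    assert_eq!(buggy_scalar_path(1_000), None); // ANSI error lost as NULL
    assert!(fixed_scalar_path(1_000).is_err()); // ANSI error preserved
}
```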


Owner Author


value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The call to .ok() will hide any Err by silently converting it to None, and that None will lead to returning a SQL NULL result. Prevents a bug where an error is silently suppressed

.is_some()
{
    return Ok(child);
}


CheckOverflow skip ignores its own data_type and fail_on_error

Medium Severity

When CheckOverflow wraps a WideDecimalBinaryExpr, the code returns the child directly without verifying that CheckOverflow's data_type (precision/scale) matches the WideDecimalBinaryExpr's output type. If Spark's plan specifies a different precision/scale in CheckOverflow than the binary expression's return_type, the output type would be wrong. The fail_on_error flag from CheckOverflow is also silently discarded.


Owner Author


value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! Before fusing (by using WideDecimalBinaryExpr) the logic should check that the Cast's precision/scale match the requested output precision/scale pair. Prevents calculating wrong rescaling if they don't match


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance optimizations for decimal arithmetic expressions by fusing multiple operations into single, more efficient expressions: Cast and CheckOverflow are combined into DecimalRescaleCheckOverflow, and the chain of Cast -> BinaryExpr -> Cast for wide decimal operations is replaced by a new WideDecimalBinaryExpr. While the overall approach is excellent and includes comprehensive unit tests and benchmarks, it introduces potential Denial of Service (DoS) vulnerabilities via executor panics and Data Integrity issues. This is due to new code paths lacking sufficient validation for extreme scale and precision values provided in the Spark plan, which can lead to incorrect scaling factors. These security concerns should be addressed by using checked arithmetic or adding explicit input validation for decimal parameters. Additionally, there are a couple of minor suggestions to improve code clarity and idiomatic usage in the new implementations.

let arg = self.child.evaluate(batch)?;
let delta = self.output_scale - self.input_scale;
let abs_delta = delta.unsigned_abs();
let scale_factor = 10i128.pow(abs_delta as u32);


security-medium

The evaluate method calculates scale_factor using 10i128.pow(abs_delta as u32). Since abs_delta is derived from the difference of two i8 scales provided in the Spark plan, it can exceed 38 (the maximum power of 10 that fits in a signed 128-bit integer). For example, if output_scale is 38 and input_scale is -1, abs_delta would be 39, causing a panic and crashing the executor. This represents a Denial of Service (DoS) risk if an attacker can provide a Spark plan with extreme scale values.

Recommendation: Use checked_pow and handle the overflow by returning an error or null, or validate the scales during the planning phase.
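The recommended guard can be sketched with std only: `checked_pow` returns None instead of panicking once 10^delta no longer fits in an i128 (i.e., once the delta exceeds 38). The function name and error message are illustrative, not the Comet code.

```rust
// Guarded scale-factor computation: refuse out-of-range deltas instead of
// panicking inside pow(), which is the DoS scenario described above.
fn scale_factor(abs_delta: u32) -> Result<i128, String> {
    10i128
        .checked_pow(abs_delta)
        .ok_or_else(|| format!("scale delta {abs_delta} out of Decimal128 range"))
}

fn main() {
    assert!(scale_factor(38).is_ok());  // 10^38 still fits in i128
    assert!(scale_factor(39).is_err()); // plain pow() would have panicked
}
```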

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Apache Arrow supports decimals with negative scales. The subtraction of the scales here could lead to a bigger result (x - (-y) == x + y) that could lead to overflowing the maximum allowed scale of 38. Checks should be added to prevent such overflows and/or panics.

/// Maximum absolute value for a given decimal precision: 10^p - 1.
#[inline]
fn precision_bound(precision: u8) -> i128 {
    10i128.pow(precision as u32) - 1


security-medium

The precision_bound helper function uses 10i128.pow(precision as u32). If the output_precision (a u8 value from the Spark plan) is greater than 38, this call will panic. While Spark typically limits decimal precision to 38, Comet should validate this input to avoid crashing the executor on malformed or malicious plans.

Recommendation: Validate that precision is within the supported range (1-38) before calling pow, or use checked_pow.

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Apache Arrow supports decimals with negative scales. The subtraction of the scales here could lead to a bigger result (x - (-y) == x + y) that could lead to overflowing the maximum allowed scale of 38. Checks should be added to prevent such overflows and/or panics.

Comment on lines +151 to +157
fn i256_pow10(exp: u32) -> i256 {
    let mut result = i256::ONE;
    for _ in 0..exp {
        result = result.wrapping_mul(I256_TEN);
    }
    result
}


security-medium

The i256_pow10 function uses wrapping_mul to compute scaling factors. If the exponent exp exceeds 77 (the maximum power of 10 that fits in a 256-bit integer), the result will overflow and wrap around, leading to incorrect scaling factors. This can occur if the difference between decimal scales is extreme. Subsequent arithmetic operations using these incorrect factors will produce wrong results, posing a data integrity risk.

Recommendation: Use checked arithmetic or validate that the exponent is within the safe range (0-77) before performing the calculation.
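A guarded analog of `i256_pow10` looks like the sketch below. Since std Rust has no i256, this version uses i128 (bound 38) to illustrate the same guard the review asks for at bound 77: cap the exponent up front and use checked multiplication so a wrap can never produce a wrong scaling factor.

```rust
// Guarded power-of-ten computation: reject exponents beyond the type's
// capacity, and let checked_mul catch anything the cap misses.
fn pow10_checked(exp: u32, max_exp: u32) -> Option<i128> {
    if exp > max_exp {
        return None; // would overflow: refuse instead of wrapping
    }
    let mut result = 1i128;
    for _ in 0..exp {
        result = result.checked_mul(10)?;
    }
    Some(result)
}

fn main() {
    assert_eq!(pow10_checked(3, 38), Some(1_000));
    assert_eq!(pow10_checked(39, 38), None);
}
```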

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! A check should be added to prevent trying to use exponents bigger than 77. Otherwise the multiplication operation will wrap and lead to wrong calculations.

Comment on lines +705 to +711
Ok(DataType::Decimal128(_p1, _s1)),
Ok(DataType::Decimal128(_p2, _s2)),
) if ((op == DataFusionOperator::Plus || op == DataFusionOperator::Minus)
    && max(_s1, _s2) as u8 + max(_p1 - _s1 as u8, _p2 - _s2 as u8)
        >= DECIMAL128_MAX_PRECISION)
    || (op == DataFusionOperator::Multiply
        && _p1 + _p2 >= DECIMAL128_MAX_PRECISION) =>


medium

The variables _p1, _s1, _p2, and _s2 are prefixed with an underscore, which typically indicates they are unused. However, they are used in the if guard of this match arm. This can be confusing for future readers. It would be clearer to remove the underscore prefix from these variable names to signal that they are intentionally used.

Suggested change
-Ok(DataType::Decimal128(_p1, _s1)),
-Ok(DataType::Decimal128(_p2, _s2)),
-) if ((op == DataFusionOperator::Plus || op == DataFusionOperator::Minus)
-    && max(_s1, _s2) as u8 + max(_p1 - _s1 as u8, _p2 - _s2 as u8)
-        >= DECIMAL128_MAX_PRECISION)
-    || (op == DataFusionOperator::Multiply
-        && _p1 + _p2 >= DECIMAL128_MAX_PRECISION) =>
+Ok(DataType::Decimal128(p1, s1)),
+Ok(DataType::Decimal128(p2, s2)),
+) if ((op == DataFusionOperator::Plus || op == DataFusionOperator::Minus)
+    && max(s1, s2) as u8 + max(p1 - s1 as u8, p2 - s2 as u8)
+        >= DECIMAL128_MAX_PRECISION)
+    || (op == DataFusionOperator::Multiply
+        && p1 + p2 >= DECIMAL128_MAX_PRECISION) =>

// divisor = 10^(-delta), half = divisor / 2
let divisor = scale_factor; // already 10^abs(delta)
let half = divisor / 2;
let sign = if value < 0 { -1i128 } else { 1i128 };


medium

For improved readability and idiomatic Rust, consider using value.signum() to get the sign of the number. While the current implementation is correct, signum() is the standard library function for this purpose.

Suggested change
let sign = if value < 0 { -1i128 } else { 1i128 };
let sign = value.signum();

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The i128 type has a signum() method that returns a similar result. The only difference is that it returns 0 for 0, not 1 as the current code.
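The behavioral difference the owner points out can be checked directly: the original branch and `i128::signum()` agree for every nonzero value and differ only at zero.

```rust
// Stand-in for the reviewed branch: maps negatives to -1 and everything
// else (including 0) to 1, unlike signum(), which maps 0 to 0.
fn branch_sign(value: i128) -> i128 {
    if value < 0 { -1 } else { 1 }
}

fn main() {
    assert_eq!((-7i128).signum(), branch_sign(-7)); // both -1
    assert_eq!(7i128.signum(), branch_sign(7));     // both 1
    assert_eq!(0i128.signum(), 0);                  // signum: 0
    assert_eq!(branch_sign(0), 1);                  // branch: 1
}
```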

@claude

claude bot commented Mar 5, 2026

Code Review: Optimize Decimal Expressions

This PR introduces WideDecimalBinaryExpr and DecimalRescaleCheckOverflow to eliminate redundant casts and intermediate arrays in decimal arithmetic. The intent is sound and the benchmark is a nice addition. I have some correctness concerns that should be addressed.

Critical / Correctness

  1. CheckOverflow bypass ignores outer type constraints (planner.rs) - The optimization skips CheckOverflow without verifying that its requested data_type (precision and scale) matches what WideDecimalBinaryExpr already produces. Both nodes come from the same Spark plan node now, but this is fragile. A future Spark version that changes precision in an intermediate step would silently return the wrong type. At minimum add a debug assertion comparing the outer data_type with wide.data_type(&input_schema) before the early return.

  2. s_out > max_scale is unhandled in WideDecimalBinaryExpr add/subtract path - need_rescale = s_out < max_scale is false when s_out >= max_scale. If s_out > max_scale, the raw result has scale max_scale but the output array is tagged with s_out, making every value off by 10^(s_out - max_scale). Spark's promotion rules generally keep s_out <= max_scale, but without a runtime guard a future planner change could silently corrupt output. Add debug_assert!(s_out <= max_scale) or handle the scale-up branch. Same concern applies to the multiply path when s_out > natural_scale.

  3. Potential i128 overflow in rescale_and_check scale-down (decimal_rescale_check.rs) - The expression value + sign * half can overflow i128 when value is near i128::MIN and sign is -1, before the precision bounds check runs. A safe restructuring: let (quot, rem) = (value / divisor, value % divisor); let round_up = rem.abs() * 2 >= divisor.abs(); quot + if round_up { value.signum() } else { 0 }
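The restructuring suggested in point 3 can be written out as below. One refinement over the sketch in the comment: comparing `rem.abs()` against `divisor - rem.abs()` is equivalent to `rem.abs() * 2 >= divisor` but cannot itself overflow, since both sides stay below the divisor. This assumes the divisor is a positive power of ten, as in the reviewed code; it is an illustration, not the Comet implementation.

```rust
// Half-up rounding via quotient/remainder: no intermediate ever exceeds
// the input's range, so i128::MIN is handled without overflow.
fn round_half_up(value: i128, divisor: i128) -> i128 {
    let quot = value / divisor; // truncates toward zero
    let rem = value % divisor;  // same sign as value, |rem| < divisor
    if rem != 0 && rem.abs() >= divisor - rem.abs() {
        quot + value.signum() // round away from zero
    } else {
        quot
    }
}

fn main() {
    assert_eq!(round_half_up(25, 10), 3);   // 2.5 rounds up
    assert_eq!(round_half_up(24, 10), 2);   // 2.4 rounds down
    assert_eq!(round_half_up(-25, 10), -3); // half away from zero
    assert_eq!(round_half_up(i128::MIN, 10), i128::MIN / 10 - 1); // no panic
}
```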

Code Quality

  1. Variables _p1/_s1/_p2/_s2 are actually used in the match guard (planner.rs) - The underscore prefix conventionally signals unused bindings, but these are used in the match guard. Rename to p1, s1, p2, s2.

  2. with_new_children can panic with wrong child count - Both DecimalRescaleCheckOverflow::with_new_children and WideDecimalBinaryExpr::with_new_children index into children without bounds checks. Add a length assertion or a SAFETY comment for consistency with other PhysicalExpr implementations.

  3. Redundant comment in rescale_and_check - There is a leftover comment that restates the assignment on the line directly above it. Remove the duplicate.

Minor

  1. No test for EvalMode::Try in WideDecimalBinaryExpr - EvalMode::Try is treated identically to EvalMode::Legacy (both return null on overflow) but no test covers this mode. Add at least one test.

  2. Benchmark boundary comment - The comment states the sum equals 38 >= 38. Using precision values that produce a sum of 39 would make the need for wide arithmetic unambiguous.

Summary: 3 critical correctness issues, 3 code quality items, 2 minor notes. Items 1-3 are the most important to address before merging. The benchmark is a nice addition that makes the performance trade-off clear and reproducible.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@native/core/src/execution/planner.rs`:
- Around line 395-409: The fusion replaces a Cast+CheckOverflow pair without
verifying the Cast's output type; update the precondition in the block that
matches child.as_any().downcast_ref::<Cast>() so it also checks that
cast.data_type(&input_schema) equals the target DataType::Decimal128(p_out,
s_out) (i.e., the same precision/scale as the CheckOverflow target) before
creating DecimalRescaleCheckOverflow; locate the matching logic around data_type
and cast.child.data_type(&input_schema) and add the extra equality check on
cast.data_type(&input_schema) to tighten the fusion precondition.

In `@native/spark-expr/src/math_funcs/internal/decimal_rescale_check.rs`:
- Around line 204-213: The scalar Decimal128 branch is currently calling
rescale_and_check(...).ok() which swallows errors and turns ANSI overflow (when
fail_on_error is true) into a null; change this to propagate errors instead of
dropping them: replace the .ok().and_then(...) chain in the
ColumnarValue::Scalar(ScalarValue::Decimal128(...)) arm with a match on
rescale_and_check(...) that returns Err(e) when the call returns Err(e), and for
Ok(r) map r == i128::MAX to None (legacy null) or Some(r) otherwise, then
construct the Decimal128 scalar with that result; reference rescale_and_check,
ColumnarValue::Scalar, and ScalarValue::Decimal128 to locate the code to update.
- Around line 224-234: The with_new_children method currently indexes
children[0] and can panic for the wrong arity; update with_new_children (on
DecimalRescaleCheckOverflow) to first validate children.len() == 1 and return an
Err(datafusion::common::Result::Err(...)) when the count is wrong instead of
indexing directly; then construct the new Arc by using the validated children[0]
(or children.get(0).unwrap()) and call DecimalRescaleCheckOverflow::new with the
same self.input_scale, self.output_precision, self.output_scale, and
self.fail_on_error values.

In `@native/spark-expr/src/math_funcs/wide_decimal_binary_expr.rs`:
- Around line 272-283: The with_new_children implementation for
WideDecimalBinaryExpr assumes exactly two children and can panic; update
with_new_children to validate children.len() == 2 and return a proper Err when
the arity is incorrect instead of indexing into children. Locate the method
named with_new_children on WideDecimalBinaryExpr and before calling
WideDecimalBinaryExpr::new, check the number of children and return an
appropriate Result error (preserving existing error type used in this crate)
when it is not 2; otherwise construct the new instance using
Arc::clone(&children[0]) and Arc::clone(&children[1]) with the same self.op,
self.output_precision, self.output_scale, and self.eval_mode.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f1cf588a-02c2-4f3b-b579-9c9e0ea0b269

📥 Commits

Reviewing files that changed from the base of the PR and between 0a8b2e9 and 486eda2.

📒 Files selected for processing (8)
  • native/core/src/execution/planner.rs
  • native/spark-expr/Cargo.toml
  • native/spark-expr/benches/wide_decimal.rs
  • native/spark-expr/src/lib.rs
  • native/spark-expr/src/math_funcs/internal/decimal_rescale_check.rs
  • native/spark-expr/src/math_funcs/internal/mod.rs
  • native/spark-expr/src/math_funcs/mod.rs
  • native/spark-expr/src/math_funcs/wide_decimal_binary_expr.rs

Comment on lines +395 to +409
if let Some(cast) = child.as_any().downcast_ref::<Cast>() {
    if let (
        DataType::Decimal128(p_out, s_out),
        Ok(DataType::Decimal128(_p_in, s_in)),
    ) = (&data_type, cast.child.data_type(&input_schema))
    {
        return Ok(Arc::new(DecimalRescaleCheckOverflow::new(
            Arc::clone(&cast.child),
            s_in,
            *p_out,
            *s_out,
            fail_on_error,
        )));
    }
}


⚠️ Potential issue | 🟡 Minor

Tighten fusion precondition by validating cast output type too.

The fusion currently validates only cast.child input type. Please also require the Cast output type to match the CheckOverflow target decimal type before replacing the pair.

🛠️ Suggested fix
                 if let Some(cast) = child.as_any().downcast_ref::<Cast>() {
                     if let (
                         DataType::Decimal128(p_out, s_out),
                         Ok(DataType::Decimal128(_p_in, s_in)),
-                    ) = (&data_type, cast.child.data_type(&input_schema))
+                        Ok(DataType::Decimal128(cast_p, cast_s)),
+                    ) = (
+                        &data_type,
+                        cast.child.data_type(&input_schema),
+                        cast.data_type(&input_schema),
+                    )
                     {
-                        return Ok(Arc::new(DecimalRescaleCheckOverflow::new(
-                            Arc::clone(&cast.child),
-                            s_in,
-                            *p_out,
-                            *s_out,
-                            fail_on_error,
-                        )));
+                        if cast_p == *p_out && cast_s == *s_out {
+                            return Ok(Arc::new(DecimalRescaleCheckOverflow::new(
+                                Arc::clone(&cast.child),
+                                s_in,
+                                *p_out,
+                                *s_out,
+                                fail_on_error,
+                            )));
+                        }
                     }
                 }

Owner Author


value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Before fusing (by using WideDecimalBinaryExpr) the logic should check that the Cast's precision/scale match the requested output precision/scale pair. Prevents calculating wrong rescaling if they don't match

Comment on lines +204 to +213
ColumnarValue::Scalar(ScalarValue::Decimal128(v, _precision, _scale)) => {
    let new_v = v.and_then(|val| {
        rescale_and_check(val, delta, scale_factor, bound, fail_on_error)
            .ok()
            .and_then(|r| if r == i128::MAX { None } else { Some(r) })
    });
    Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
        new_v, p_out, s_out,
    )))
}


⚠️ Potential issue | 🟠 Major

Propagate ANSI overflow errors in scalar path (don’t silently null them).

The scalar branch currently uses .ok() on rescale_and_check, which drops errors and returns None. That makes fail_on_error = true behave like legacy nulling for scalars.

🛠️ Suggested fix
             ColumnarValue::Scalar(ScalarValue::Decimal128(v, _precision, _scale)) => {
-                let new_v = v.and_then(|val| {
-                    rescale_and_check(val, delta, scale_factor, bound, fail_on_error)
-                        .ok()
-                        .and_then(|r| if r == i128::MAX { None } else { Some(r) })
-                });
+                let new_v = match v {
+                    Some(val) => {
+                        let r = rescale_and_check(val, delta, scale_factor, bound, fail_on_error)?;
+                        if !fail_on_error && r == i128::MAX {
+                            None
+                        } else {
+                            Some(r)
+                        }
+                    }
+                    None => None,
+                };
                 Ok(ColumnarValue::Scalar(ScalarValue::Decimal128(
                     new_v, p_out, s_out,
                 )))
             }

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The call to .ok() will hide any Err by silently converting it to None, and that None will lead to returning a SQL NULL result. Fixing this prevents a bug where an error is silently suppressed.
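The hazard both reviewers flagged can be shown in isolation. This is a minimal sketch with a hypothetical `rescale` stand-in for the real `rescale_and_check`: `.ok()` converts an Err into None, which downstream becomes a SQL NULL instead of the ANSI overflow error the caller expects.

```rust
// Hypothetical stand-in for rescale_and_check: errors when fail_on_error
// (ANSI mode) is set and the value overflows the allowed range.
fn rescale(v: i128, fail_on_error: bool) -> Result<i128, String> {
    if fail_on_error && v > 100 {
        return Err("overflow".to_string());
    }
    Ok(v.min(100))
}

fn main() {
    // Buggy path: the ANSI error is swallowed by .ok() and surfaces as
    // None, which the expression would emit as SQL NULL.
    let buggy: Option<i128> = rescale(200, true).ok();
    assert_eq!(buggy, None);

    // Fixed path: match on the Result so Err propagates to the caller.
    let fixed: Result<Option<i128>, String> = match rescale(200, true) {
        Ok(r) => Ok(Some(r)),
        Err(e) => Err(e),
    };
    assert!(fixed.is_err());
}
```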

Comment on lines +224 to +234
fn with_new_children(
self: Arc<Self>,
children: Vec<Arc<dyn PhysicalExpr>>,
) -> datafusion::common::Result<Arc<dyn PhysicalExpr>> {
Ok(Arc::new(DecimalRescaleCheckOverflow::new(
Arc::clone(&children[0]),
self.input_scale,
self.output_precision,
self.output_scale,
self.fail_on_error,
)))

⚠️ Potential issue | 🟡 Minor

Validate child count in with_new_children to avoid panics.

Direct indexing with children[0] can panic if a malformed rewrite passes the wrong arity.

🛠️ Suggested fix
     fn with_new_children(
         self: Arc<Self>,
         children: Vec<Arc<dyn PhysicalExpr>>,
     ) -> datafusion::common::Result<Arc<dyn PhysicalExpr>> {
+        if children.len() != 1 {
+            return Err(DataFusionError::Internal(format!(
+                "DecimalRescaleCheckOverflow expects 1 child, got {}",
+                children.len()
+            )));
+        }
+
         Ok(Arc::new(DecimalRescaleCheckOverflow::new(
             Arc::clone(&children[0]),
             self.input_scale,
             self.output_precision,
             self.output_scale,
             self.fail_on_error,
         )))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@native/spark-expr/src/math_funcs/internal/decimal_rescale_check.rs` around
lines 224 - 234, the with_new_children method currently indexes children[0] and
can panic for the wrong arity; update with_new_children (on
DecimalRescaleCheckOverflow) to first validate children.len() == 1 and return
Err(DataFusionError::Internal(...)) when the count is wrong instead of indexing
directly; then construct the new Arc from the validated children[0] and call
DecimalRescaleCheckOverflow::new with the same self.input_scale,
self.output_precision, self.output_scale, and self.fail_on_error values.

Comment on lines +272 to +283
fn with_new_children(
self: Arc<Self>,
children: Vec<Arc<dyn PhysicalExpr>>,
) -> Result<Arc<dyn PhysicalExpr>> {
Ok(Arc::new(WideDecimalBinaryExpr::new(
Arc::clone(&children[0]),
Arc::clone(&children[1]),
self.op,
self.output_precision,
self.output_scale,
self.eval_mode,
)))

⚠️ Potential issue | 🟡 Minor

Add arity checks in with_new_children for panic safety.

This method assumes exactly two children and will panic on invalid rewrite inputs.

🛠️ Suggested fix
-use datafusion::common::Result;
+use datafusion::common::{DataFusionError, Result};
@@
     fn with_new_children(
         self: Arc<Self>,
         children: Vec<Arc<dyn PhysicalExpr>>,
     ) -> Result<Arc<dyn PhysicalExpr>> {
+        if children.len() != 2 {
+            return Err(DataFusionError::Internal(format!(
+                "WideDecimalBinaryExpr expects 2 children, got {}",
+                children.len()
+            )));
+        }
+
         Ok(Arc::new(WideDecimalBinaryExpr::new(
             Arc::clone(&children[0]),
             Arc::clone(&children[1]),
             self.op,
             self.output_precision,
             self.output_scale,
             self.eval_mode,
         )))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@native/spark-expr/src/math_funcs/wide_decimal_binary_expr.rs` around lines
272 - 283, The with_new_children implementation for WideDecimalBinaryExpr
assumes exactly two children and can panic; update with_new_children to validate
children.len() == 2 and return a proper Err when the arity is incorrect instead
of indexing into children. Locate the method named with_new_children on
WideDecimalBinaryExpr and before calling WideDecimalBinaryExpr::new, check the
number of children and return an appropriate Result error (preserving existing
error type used in this crate) when it is not 2; otherwise construct the new
instance using Arc::clone(&children[0]) and Arc::clone(&children[1]) with the
same self.op, self.output_precision, self.output_scale, and self.eval_mode.

@augmentcode

augmentcode bot commented Mar 5, 2026

🤖 Augment PR Summary

Summary: This PR optimizes Spark-compatible decimal arithmetic execution by fusing common overflow/rescale patterns and avoiding extra intermediate arrays.

Changes:

  • Updates the native physical planner to emit a new fused WideDecimalBinaryExpr for Decimal128 add/sub/mul cases that require wider intermediate precision.
  • Skips redundant CheckOverflow wrapping when the child is already a wide-decimal expression that enforces overflow semantics.
  • Introduces DecimalRescaleCheckOverflow to fuse Decimal128→Decimal128 rescaling and precision overflow checking into a single pass.
  • Exports the new expressions from native/spark-expr for planner use.
  • Adds a Criterion benchmark (wide_decimal) comparing the old Cast→BinaryExpr→Cast chain vs the fused wide-decimal expression.

Technical Notes: The wide-decimal path performs i256 intermediate arithmetic with HALF_UP rounding, then applies precision bounds (null on overflow in Legacy/Try; error in ANSI).

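The technical note above can be sketched in plain Rust. This is a hedged illustration, not the PR's actual API: i128 stands in for arrow's i256 wide type, and `rescale_half_up` is a hypothetical helper showing HALF_UP rounding followed by the precision-bound check.

```rust
// Sketch of the wide-decimal rescale step: HALF_UP rounding (round half
// away from zero) followed by a precision-bound check. The real code uses
// arrow's i256 intermediates; i128 stands in here.
fn rescale_half_up(wide: i128, scale_down_by: u32, precision: u32) -> Option<i128> {
    let factor = 10i128.checked_pow(scale_down_by)?;
    let half = factor / 2;
    // HALF_UP: add half the divisor away from zero before truncating division.
    let adjusted = if wide >= 0 { wide + half } else { wide - half };
    let scaled = adjusted / factor;
    // Precision bound: the result must fit in `precision` decimal digits.
    // Callers map None to SQL NULL (Legacy/Try) or raise an error (ANSI).
    let bound = 10i128.checked_pow(precision)?;
    if scaled.abs() < bound {
        Some(scaled)
    } else {
        None
    }
}

fn main() {
    // 1.25 * 1.25 = 1.5625 (scale 4), rescaled to scale 2 -> 1.56
    assert_eq!(rescale_half_up(15625, 2, 10), Some(156));
    // Exactly half rounds away from zero: 0.0050 -> 0.01, -0.0050 -> -0.01
    assert_eq!(rescale_half_up(50, 2, 10), Some(1));
    assert_eq!(rescale_half_up(-50, 2, 10), Some(-1));
    // Result exceeds the precision bound -> None
    assert_eq!(rescale_half_up(10i128.pow(12), 0, 10), None);
}
```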

@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.

let arg = self.child.evaluate(batch)?;
let delta = self.output_scale - self.input_scale;
let abs_delta = delta.unsigned_abs();
let scale_factor = 10i128.pow(abs_delta as u32);

scale_factor = 10i128.pow(abs_delta as u32) can overflow/wrap for larger scale deltas (e.g., casts involving negative scales), which would make the rescale+overflow check incorrect or panic in debug builds.

Severity: medium


Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Apache Arrow supports decimals with negative scales, so subtracting the scales can produce a delta larger than either scale alone (x - (-y) == x + y), which can exceed the maximum supported scale of 38 and overflow the power computation. Checks should be added to prevent such overflows and/or panics.
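The guard the reviewer suggests can be sketched as follows. This is an illustrative stand-alone function, not the PR's actual code: it computes the rescale factor with `checked_pow` so that large scale deltas (possible with negative scales) surface as None instead of wrapping or panicking.

```rust
// Hedged sketch: guard the scale-factor computation against pow overflow.
// 10^38 is the largest power of ten representable in i128, so any wider
// delta makes checked_pow report overflow as None.
fn scale_factor(input_scale: i8, output_scale: i8) -> Option<i128> {
    let delta = (output_scale as i32) - (input_scale as i32);
    let abs_delta = delta.unsigned_abs();
    10i128.checked_pow(abs_delta)
}

fn main() {
    assert_eq!(scale_factor(2, 4), Some(100));
    // A negative input scale widens the delta: 20 - (-20) = 40 digits.
    assert_eq!(scale_factor(-20, 20), None);
    assert_eq!(scale_factor(0, 38), Some(10i128.pow(38)));
}
```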

}
ColumnarValue::Scalar(ScalarValue::Decimal128(v, _precision, _scale)) => {
let new_v = v.and_then(|val| {
rescale_and_check(val, delta, scale_factor, bound, fail_on_error)

In the scalar path, rescale_and_check(...).ok() drops errors; if fail_on_error is true (ANSI), an overflow would become a NULL result instead of propagating an error.

Severity: high


Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The call to .ok() will hide any Err by silently converting it to None, and that None will lead to returning a SQL NULL result. Fixing this prevents a bug where an error is silently suppressed.
